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Abstract 



0^ . In using data assimilation to import information from observations to estimate parameters 
and state variables of a model, one must assume a distribution for the noise in the measure- 
ments and in the model errors. Using the path integral formulation of data assimilation [ij , 
we introduce the idea of self consistency of the distribution of stochastic model errors: the 
distribution of model errors from the path integral with observed data should be consistent 
with the assumption made in formulating the path integral. The path integral setting for 
data assimilation is discussed to provide the setting for the consistency test. Using two 
2 \ examples drawn from the 1996 Lorenz model, for D = 100 and for D = 20, we show how 
CN I one can test for this consistency with essentially no additional effort than that expended in 
CN ■ extracting answers to interesting questions from data assimilation itself. 
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^ ' 1. Introduction 

Assimilating information from observed data to models of the observed system requires 
a formulation of the way in which the data communicates information to the model and 
a formulation of the manner in which the model propagates states between observations. 
In the former, one must provide an approximation to the conditional mutual information 
between the model state and the observation at each observation time. For the latter, one 
must provide an approximation to the model of the dynamical processes along with the 
errors in the model. As one has some control over errors in measurements and models of the 
measurement, we focus here on errors in the models and how they are represented. 

There are two general types of errors in models: (a) deterministic errors associated 
with dynamical processes absent from the model, and (b) stochastic errors associated with 
the resolution of the model as discretized in space and time or associated with noise that 
obscures perfect resolution. Unresolved processes at small space/time scales usually fall 
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into the second category. One of the goals of data assimilation is to use information from 
measurements to estimate unknown constant parameters in a model, so we do not consider 
unknown values of fixed parameters as model errors. 

In the next section we will be more precise about models and observations. For the 
discussion at this point, designate the D-dimensional state of the model at time t„ as x(t„) = 
x(n). A dynamical model in discrete time (if partial differential equations underlie the 
observations, spatial variables have also been discretized) will give a rule for taking the 
state of the system x(n) to the state at a later time tn+i, x(?7, + 1), of the form g(x(n + 
1), x(?7,), p^) = 0, where a set of fixed parameters for the model p^j is indicated. If the 
connection between x(n) and x(n + 1) is explicit, we would use g(x(n),x(n + l),pj^) = 
x(n + 1) — f(x(n),pjyj) for the model error. In what follows we use these two expressions 
interchangeably. 

Errors of type (a) above can be designated as corrections to a deterministic model by 
writing x(n + 1) = f(x(?T,)) + Af(x(n). This gives us little information on how the model 
error is to be represented as the correction Af(x(n)) to the vector field f(x(n),p^) is so 
general. Indeed we know of no algorithmic manner in which to select such errors in the 
model. 

Stochastic errors lead away from a deterministic model and require our focus on the 
probability of transition to x(n+ 1) given x(n): P(x(?2 + l)|x(n)). In a deterministic setting 
P(x(n + l)|x(n)) = 5^(x(n + 1) - f(x(n),PA/)) = 5^(g(x(n + 1), x(n), p^)), and in the 
presence of stochastic model error the delta function is broadened. How it is broadened 
requires an assumption, and in this paper we propose a systematic manner to examine that 
assumption in the case where data from observations are available. Absent observations, any 
assumption about model errors is consistent, and, in fact, it prescribes the model itself. 

In the next section we recall the general formulation of data assimilation in situations 
where there are noisy observations, model errors, and uncertainty in the state of the model 
when observations begin at t^, -PfxfO)). This gives us a path integral over model states 
through an observation window [l|, [l, H, 0] , yielding the conditional distribution of model 
states, from which expected state and parameter values can be evaluated. 

Though their results were not stated as a path integral, the studies by js], 0, 0] capture 
much of the structure we discuss here. Their focus was on iterative determinations of the joint 
conditional probability density P(X|Y) and on maximizing that quantity. This seeks the 
mode of the distribution, while the path integral formulation allows evaluation of moments 
such as the expected value, covariances about that, and marginal distributions of state and 
parameter values such as we explore here. 

To utilize the general formulation one must make an assumption about the distribution 
of model errors in the transition probability P(x(?t, + l)|x(n)) wherein the model dynamics 
is placed. In the presence of observations, this assumption may or may not be correct, 
and one constructive way to examine the assumption is to compare moments of the model 
error MEa{n) = ga{y^{n + l),x(n)) or MEa{n) = Xa{n + 1) - /a(x(n)); a = 1,2, ..,D; n = 
0, 1, ... during a period of observations with one's assumed distribution. If the moments one 
evaluates are consistent with the assumed P(x(?2 + l)|x(n)), one's confidence in the overall 
data assimilation effort is supported. If not, another set of assumptions on how model errors 
are distributed must be examined. The required moments are themselves taken from the 
path integral, and consistency of the model errors is conditional on the observations and on 
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one's knowledge or assumption on their distribution. 

To illustrate these ideas, we examine 'data' in a twin experiment jsj using the D = 100 
Lorenz96 model joj. We add Gaussian noise to the 'observations'. Because we generate the 
data in a twin experiment and use the data assimilation procedure to estimate the states 
of the model from sparse noisy observations, this exercise shows the procedure to be self- 
consistent when we have assumed a Gaussian broadening of the deterministic transition 
probabilities when sufficient observations are presented to the model. 

We then examine another Lorenz96 model, this time with D = 20, and add noise dis- 
tributed according to the gamma distribution to both the observations and to the dynamical 
equations. Then, on assuming these measurement and stochastic model error noise terms are 
Gaussian for purposes of assimilating data into the model, we show the Gaussian assumption 
is inconsistent. 

In this paper we do not consider deterministic errors in model dynamics. As noted, 
the possibilities are too many, and we have no general or algorithmic manner in which to 
represent them. Perhaps one could examine a simple, overall bias term in this context, 
but we do not do so here. Further, if one finds, as we do in our second example, that the 
assumption about stochasticity in the data assimilation procedure is inconsistent with one's 
model and one's observed data, we do not propose a systematic remedy. 

2. General Formulation 

We begin by establishing a framework for assimilating information from observed data to 
a model of the observed system. To start we think of a physical system which is observed at 
times tn in the observation window {tQ,ti, ...,tm = T}, and at each of these times we make 
L observations yi(tn) = Viin)] / = 1,2, ...,L. Independently of making the observations, we 
develop a model of the physical system from considerations of the physical or biophysical 
processes selected to be important in the development of the dynamics. If the observed 
physical processes are described by partial differential equations, we discretize both space and 
time to arrive at a discrete time map for the D-dimensional model state vector a = 

1,2, D] n = 0,1, ...,m. This model has D degrees of freedom and consists of a rule taking 
the D-dimensional model state x(t„) = x(n) to the model state at time tn+i '■ Xa(tn+i) = 
Xa{n + 1) = /a(x(n), Pm)', a = 1,2, D where are time independent parameters in the 
model. 

The model should also specify how the state x(n) is related to the observations yi{n) 
through a set of L observation functions /i/(x(?7,), p^) where p^ are fixed parameters in the 
observation functions. The total collection of parameters we will call p = {p^^, Po}- 

The goal of data assimilation is to use the information in the y(n), where L < D in 
essentially all cases, and often L << D, to estimate both the parameters p and the D — L 
unobserved states. If one does this over the observation window to achieve an estimation of 
the p and x(tm = T), then one may use this information for characterizing the system state, 
at least according to the model, and for making predictions for t > T using the dynamics 
prescribed through the model. 

When the measurements Y(m) = {y(m),y(m — 1), y(l), y(0)} are noisy, and the 
model has errors arising from resolution or missing terms, that is, physical processes not 
properly incorporated in the model, then the description of the outcome of using the model 
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and of utilizing the information in the observations is in terms of a conditional probability 
distribution P(x(m)|Y(m)) for the probability of the state being at x(m) at the end of the 
observation window, given the sequence of measurements Y(m). 

In this stochastic setting such a probability distribution satisfies a Fokker-Planck equa- 
tion [10] with familiar drift and diffusive terms, and when the observation noise and the 
model errors are not Gaussian, it also has an infinite number of additional terms. Since the 
Fokker-Planck equation is linear in the conditional probability, one would expect to find an 
integral representation of the desired probability distribution at time tm = T in terms of the 
probability distribution at to : P(x(0)): 



m—1 



P(x(m)|Y(m)) = / Y[d^x{n) K{X{m),Y{m))P{^{0)), 



n=0 



with X(m) = {x(m), x(m — 1), x(l), x(0)}. 

Usin g id entities on conditional information (Bayes' rule) and the Chapman-Kolmogorov 
relation [l2|, true for Markov processes where x(n + 1) depends only on x(n), as here, we de- 
rived [l| a recursion relation between the conditional probability distribution P(x(n)|Y(r?,)) 
and P(x(n-l)|Y(n-l)): 



P(x(n)|Y(n)) 



P(x(n),y(n)|y(n-1)) 



P(x(n)|r(n - 1)) P(y(n)|Y(n - 1)) 
/dM«-l)P(xW|x(.-l),P(x(„-l)|Y(„-l), 
exp[CM/(x(n),y(n)|Y(n- 1)))] 

-l)P,x,„)|x(„-l))P(x(„-l)|Y(„-l)). 



(2) 



where the mutual information [13| between the D-dimensional state x(n) and the L-dimensional 
measurement y(n), conditional on earlier measurements Y(n — 1), is 



CMJ(x(n), y(n)|Y(n- 1)) = log 



P(x(n),yH|r(n-l)) 
P(x(n)|y(n - 1)) P(y(n)|Y(n - 1)) 



(3) 



As discussed by Fano [I3j this is to be thought of as a variable on the space of states 
x(n) and observations Yip). The average mutual information over this quantity has certain 
positivity properties 13, though the CMI itself can be positive or negative. 

Iterating this recursion relation from the end of the observation period at back to 
its beginning at Iq, we have the integral representation of the solution to the underlying 
Fokker-Planck equation as 



, m—1 



/lib J. Ill, 

Yl d^x(n)exp|j^ CM/(x(n), y(n)|Y(r2 - 1)) 
n=0 n=0 
m—1 

+ ^log[P(x(n + l)|xH)]+log[P(x(0))]}. (4) 



n=0 
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This identifies the kernel K(K{m),Y{m)) in Equation (Q. 

An arbitrary function F(X) on the {m+l)D dimensional path X through the observation 
window has an expectation value 

^[F(X)|Y(m)] = <F(X)> 

^ /nn=o^^x(n)F(X)exp[-Ao(X,Y)] 

/n™o«?^xH exp[-Ao(X,Y)] 
_ / (iX F(X) exp [-Ao (X, Y)] 

/ dX exp[-Ao{X,Y)] ' 

in which the "action" is 

m 

-Ao(X(m),Y(m)) = |^ CM/(x(n), yH|Y(n - 1)) 

n=0 
m— 1 



(5) 



J2 log[P(x(n + l)|x(n))] + log[P(x(0))] . (6) 



n=0 

To estimate the expected value of the model state during and at the end of the obser- 
vation window F(X.) = X^ or the covariance about this expected value < (Xq— < Xa > 
< Xjs >) >, with a, (3 including both the D model state components and the (m + 1) 
time indices, or any other < F(X.) > of interest, we must perform the indicated integration 
in what may well be a quite high dimensional space. In general this cannot be accomplished 
analytically when the dynamics describing the transition x(n) — )■ x(n + 1) is nonlinear, 
so approximations must be made to both the conditional mutual information term in the 
action and to the transition matrix P(x(n + l)|x(n)) term describing the model dynam- 
ics. With these approximations, one must still perform the (m + 1)D dimensional integrals 
approximately. 

If the model were without errors and the resolution of the state were perfect, the transition 
matrix would be 

P(x(n + l)|x(n)) = 5^(x(r2 + 1) - f (x^, p,,)) = 5^(g(xH, x(n + 1), p^)), (7) 
and in our examples below we use 

g(x(n), x(n + l),pj,,) = -x(n) + x(n + 1) + ^|F(x(n), pj^,) + F(x(n + 1), Pm)}, (8) 

where the ordinary differential equations for the model after spatial discretization is 

dx{t) 



dt 



F(x(t),PM), (9) 



and the time step in the time discretization is At. 

It is an unlikely situation to have no model errors except in problems where we prescribe 
the model, generate data using the model with a selected set of parameters and initial 
conditions x(0) and then utilize the path integral or other format for data assimilation to 
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estimate those parameters and state variables at the end of observations ^{tm = T). This 
kind of 'twin experiment' jsl is always helpful in testing data assimilation methods. 

We must make an approximation to P{'x{n + l)|x(n)). If we assume the delta function 
is broadened by Gaussian noise or errors in resolution we can replace 

D m-l 

d^i^in + 1) - f (x(n), p,,)) ^ i^f/' exph^ ^ 5^(x,(n + 1) - /.(xH))^], (10) 

a=l n=0 

which becomes the delta function as the inverse resolution Rf —> oo. We do not continue to 
explicitly show the dependence of the vector field f(x) on the parameters. There are many 
approximate forms of a delta function which become the delta function when the width of 
the approximate distribution goes to zero, and the Gaussian case, though in common use, is 
here as an illustration. 

To use the general result Equation ([5]) for numerical estimation of < -F(X) >, we also 
require an approximation for the conditional mutual information term in the action. Though 
not necessary, it is common to assume that the measurements yi{n) are independent of each 



other at time n and independent of measurements at earlier times (however, see 15|). In 



this setting one may represent the conditional mutual information terms in the action as 

1=1 n=0 

when the measurement errors are assumed to be distributed as a Gaussian with variance 

D-l 

With these approximations, the action becomes 

L m 

Ao{X{m),Y{m)) = -fY.Y.^yi{n)-h{^{n))f 

1=1 n=0 
j-j D m— 1 

a=l n=0 

- log[F(x(0)]. (12) 

Importantly, when the dynamics f (x) is nonlinear, this is not quadratic in the state variables, 
so the integral. Equation ([5]), requires numerical evaluation. 

The goal of this paper is to examine the consistency of assumptions such as Equation 
( ITOj) . If the Gaussian assumption in Equation ( ITOj) or any other specific assumption about 
the stochastic model error is to be consistent, then the distribution of the model error for 
a = 1,2, .., D; n = 0,1, ... 

MEa{n) = Xa{n + 1) - /a(x(n)) = ga{^{n),^{n + 1)) (13) 

resulting from performing the integral Equation ^ with F(X(m)) = MEa{n) should be 
the same as when evaluating the expectation of i^(X) directly from the assumption on 
P(x(n + l)|x(n)). 
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This means that < i^(X) > evaluated from Equation ([5]) should numerically be the same 

as 

/m—l 
dX(m) II F(x(n + l)|x(n)) F(X(m)). (14) 

71=0 

While demonstrating this would require comparing the quantities < -F(X) > and < F(X) >me 
for arbitrary functions or, perhaps for all moments assuming they exist, we cannot do that 
in practice, so we examine a much more limited set of comparisons. 

The reader will have realized that the equality of < F(X) > and < -F(X) >me is 
equivalent to 

m 

J2 CM/(x(n),y(n)|Y(n - 1)) = 0, (15) 

n=0 

and is essentially a test of how well the model, along with assumptions of representations of 
its stochastic errors, performs in matching the model output with the observations Y(m). 

This is also essentially the same as saying the model output hi{'x{n)) at each measurement 
time tn is synchronized to the measurements yi{n) at those times. In the case of independent 
measurements with a Gaussian distribution of measurement errors Equation ffTOj) . it also 
represents maximal information transfer from the observed data to the model. In this spirit, 
under the assumption that noise in the measurements is additive, we could also examine 
the question of whether all aspects of the observation error OEi{n) = yi{n) — hi{-x{n)); I = 
1,2,.. .,L; n = 0,1,..., m are such that < F{OEi{n)) >= F{0). In the remainder of this 
paper, we focus on considerations of the stochastic model error, recognizing its equivalence 
to the accuracy with which observation errors are essentially zero. 



3. A First Example from the Lorenz96 Model with D = 100 



The 1996 model of Lorenz j9|| has D degrees of freedom Wa{t); a = 1, 2, D in a periodic 
sequence satisfying the differential equations 

= Wa^i{t){Wa+l{t) - Wa-2{t)) - Wa{t) + f, (16) 

with w_i{t) = wr)-iii),Wo{t) = wnif), and WEi+i{t) = Wi{t), with some choice of Wa{0). The 
'forcing constant' / is a control parameter for the bifurcations of the solutions Wa{t) of these 
equations, and when f ^ 8 or larger, the solutions exhibit chaotic behavior. 

In this example, we selected D = 100 and / = 8.17 and generated a set of 'data' yi{tn) for 
tn+i — tn = 0.01 and L = 40. L = 40 was chosen from considerations of the smoothness of the 
action for this problem as a function of the number of measurements or equivalently from 
the number of positive conditional Lyapunov exponents on the synchronization manifold 
xi(n) ^ yi{n) which, for the Lorenz96 problem, was determined tohe L ^ O.AD (l6l . ItI 18 |. 



The measurement function is the unit operator in the present example /i/(x(n)) = Xi{n) 
We used the Gaussian assumption of stochastic noise to approximate the action in the data 
assimilation path integral, so we adopted Equation f|T2|) for the evaluation of < -F(X) > for 
functions on the path i^(X). 
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We generated 'data' solving Equation (fT6|) for a time step At = 0.01 and evaluated 
yi{n); / = 0, 1, D — 1 for tn+i —tn = 0.05, namely, every fifth time step in the development 
of the dynamics yields observations y(n). Data for 

I = {0, 2, 5, 7, 10, 12, 15, 17, 20, 22, 25, 27, 30, 32, 35, 37, 40, 42, 45, 

47, 50, 52, 55, 57, 60, 62, 65, 67, 70, 72, 75, 77, 80, 82, 85, 87, 90, 92, 95, 97} 

were presented to the model for an observation window < t < 4. No observations were 
presented in a prediction window [4,6]. The integrals involved were of dimension (m + 
1)D = 60100, and we evaluated them approximately using a standard Metropolis-Hastings 



Monte Carlo method [18|, Il9|, |20| . In this method we selected starting paths at random and 
performed 35,000 initialization path selections before recording statistics on < i^(X) > for 
another 310,000 paths. The recorded paths were broken into blocks of 100 with 3100 blocks 
evaluated to approximate the integrals. These calculations were performed on a standard 
2.5 GHz CPU. They were subsequently performed on a NVIDIA GPU with a speedup of 
about 75-100 in each case. 

In the calculations we averaged over each block of 100 paths to achieve N^p = 3100 
accepted paths X'--'^ of length 60101 each, including the parameter /. With these paths we 
approximated < F(X.) > as 

. Nap 

<F(X)>^-— 5^F(X(^)), (17) 

as the accepted paths were distributed according to exp[— Ao(X, Y)] by the Metropolis- 
Hastings procedure. The expected error in this approximation to < -F(X) > is a few parts 
in 10-2. 

We evaluated the expected values of F(X) from the set {xa{n)'^, MEa{n)} for q = 1, 2, 3, 4 
allowing us to estimate the expected mean path, the RMS variation about that path as well 
as the skewness and kurtosis about that mean. The latter allows us to examine whether the 
common assumption that the integrals involved are approximately Gaussian is correct. From 
the collection of accepted paths we are also able to estimate the marginal distributions of 
any element of the path, and in particular we were interested in the distribution of MEa{n). 
If the assumption made in formulating the action Equation (fT2|) that the stochastic model 
errors are distributed as a Gaussian, then we expect the mean of MEa{n) for any index a 
and any time n to be zero. We expect the RMS error variation about this mean to satisfy 
RMS{MEa{n))^^^^/Rf = 1 and the distribution P[MEa{n)) for any a and n to be Gaussian. 

When we generated our 'data' by solving the Lorenz96 D = 100 equations, we added 
Gaussian noise with a signal to noise ratio of about 23 dB to the clean signal. This translates 
to Rm ~ 8.0, and we used that value in our Monte Carlo integrations. We also selected 



Rf = 100 as our experience with these methods [18[ suggests that Rf ^ 10i?m gives a 



sufficiently large Rf that the imposition of the approximate equations of motion is accurate. 

As examples of the outcome of these calculations, we report that for MEyq{81) (chosen 
at random) the mean value was 9.9 x 10"^, RM S (M Ejq{81)) ^/Rj = 0.97, and the skewness 
and kurtosis of this variable were smaller than 0.01 in magnitude. 

Figure ([1]) Left shows the distribution of values of MEfQ{81) over the 3100 paths along 
with the best fit Gaussian to that distribution. As RMS{MEiq{81)) \/Rf = 0.97, it is clear 
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to the eye that this distribution of this model error is consistent with the assumed stochastic 
model error distribution. Figure ([T]) Right also shows the distribution of M£'47(60) from 
the same set of calculations compared again to the best fit Gaussian distribution. In this 
case the mean value of M£;47(60) was 8.8 x 10"^ and RMS{ME47{Q0)) ^/R} = 0.9899; 
again the skewness and kurtosis are quite small as would be expected for a nearly Gaussian 
distribution. We can say for this case also that the output distribution of stochastic model 
errors is consistent with the assumed distribution. 

In Figure (2) Left we show the expected value of the observed variable X47(t) and its 
RMS error through the observation window [0,4] and into the prediction region [4,6]. In 
Figure (2) Right we show the expected value of the unobserved variable X7Q{t) and its RMS 
error through the observation window [0,4] and into the prediction region [4,6]. Figure (3) 
shows the skewness and kurtosis for the observed variable X4i{t) through the window [0,6]. 
We see that the skewness and kurtosis are small in the observation window and then grow 
substantially after observations are terminated. This is consistent with the chaotic orbits of 
the model. Figure (4) displays the skewness and kurtosis of the unobserved variable x^eit), 
and we see that the values both within and without the observation window are larger than 
for the observed variable. 

We noted above that the consistency of the assumed and the calculated distribution is 
equivalent to the precision of the equahty of model output as estimated through the path 
integral and the observations presented to the model. 



4. A Second Example from the Lorenz96 Model with D = 20 

For a second example we again use the Lorenz96 model, Equation fll6p but now with D = 
20. We selected / = 7.93; again a value leading to chaotic orbits. We introduced noise into 
the dynamics and added noise to the observations presented to the model yi{n) for L = 8 
and for 

/ = {0,2,5,7,9,11,17,18}. (18) 



The noise was taken from the Gamma distribution 21 1 



Prix) = (19) 
r(a) 

and added to each component of the model and to the 'data' generated by the model as 

scale(Pr(a;) -a), (20) 



noting that 



dx Pr{x) X = a. (21) 







We selected a = 7 and scale = 0.05 in the dynamical equations and scale = 0.205 in the 
additive noise in the data yi{n). Again we chose Rm = 8 and Rf = 100 using m = 410 with 
At = 0.02 in the integration to produce the data. The path integrals were thus of dimension 
8220, and we used 25,000 initialization Monte Carlo accept/eject steps to begin followed by 
271,000 steps where statistics were recorded. 
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Selecting, again randomly, the model error ME^ldl), we display the distribution of this 
model error in Figure (E]) Right along with a best fit Gaussian distribution. For this model 
error term, the mean value was -0.064, while i?MS'(Mi?8 (91)) = 0.609 which is quite 
different from the unity required for consistency with the assumption of Gaussian broadening 
of the deterministic P(x(n + l)|x(n)) assumed in the path integral. One can see from 
Figure (|5]) that the computed distribution P{MEg{91)) is significantly narrower than a 
Gaussian. The same calculation but for the model error term M£'ii(104) is shown in Figure 

Left again along with a best fit Gaussian. For this distribution the mean was 7 x 10~^ 
and RMS{MEii{104:))^,/Rf = 0.64. One can conclude that the assumption of Gaussian 
broadening of the transition probability used in the path integral and of Gaussian additive 
noise using in the conditional mutual information term of the action are not consistent with 
the data and the noisy model. 

If we examine Figure (jS]) where the expected value of the observed model variable xn{t) 
through the observation window [0, 8.2] is displayed along with the calculated RMS variation 
about that expected value and with the observed noisy 'data' points presented to the model, 
we see sizeable regions where the estimated value < xn > {t) deviates from the observations. 
In Figure ([7]) one sees very large values of the skewness and kurtosis within the observation 
window for the observed model variable Xij{t) showing the departure of its distribution 
from a Gaussian. Figure (jH]) shows the same features for the unobserved model dynamical 
variable xu(t). 

Certainly more precise statistical tests can be made to determine the deviation of the 
distribution of the model errors selected here from Gaussians. As Gaussians were assumed 
in formulating the path integral as described above, we can conclude with confidence that 
the Gaussian assumption is inconsistent with the data. Of course, we built this into our 
'twin experiment' calculation, so it is perhaps reassuring that we are able to detect this 
inconsistency with essentially no more effort than already required in evaluating the data 
assimilation path integral for our other purposes: estimation of parameters and states within 
and at the end of an observation period, prediction of the estimated states and their RMS 
errors beyond the observation window, .... Once we have the accepted paths X'--'^ evaluation 
of any F(X''-'^) is quite straightforward. 

In the more interesting situation of data from field or laboratory observations, we may 
make the same calculations of our assumed stochastic model error terms and check equally 
straightforwardly for the consistency of these assumptions. 

5. Discussion 

In assimilating information from measurements to a model of the observed system when 
the data are noisy, the models have error, and the state of the model system uncertain when 
measurements begin, one must make assumptions both about the way to represent stochastic 
model errors and noisy information transfers from the data. The latter are often known 
through knowledge of the sensors and the environmental noise during measurements. Model 
errors can be structural and deterministic arising from physical processes unaccounted for in 
developing the model or they can be stochastic representing limits on the spatial or temporal 
resolution of the model or environmental noise representing fluctuations on any scale not 
dynamically treated in the model. The stochastic model errors broaden the manner in which 
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the dynamics enters data assimilation. In the deterministic case the transition probabihty 
to go from x(t„) = x(n) to x(t„+i) = x(n + 1) is P(x(n + l)|x(n)) = 5-^(x(n + 1) — f (x(n))). 
This is broadened in the stochastic data assimilation ( or ensemble data assimilation) task, 
and an assumption on how this is represented must be made. 

By comparing properties of the stochastic model errors MEa{n) = Xa{n + 1) — fa{^{n)) 
in the assumed distribution for P(x(r2 + l)|x(n)) and the properties emerging from the 
data assimilation procedure, we can test for the consistency of the assumptions about the 
stochastic model errors. 

Using two variants of the Lorenz96 model, we examined a case where there was demon- 
strable consistency of the outcome of data assimilation and assumptions about the distri- 
bution of model errors, and we reported on another example where there was not such 
consistency. If the distribution of model errors is consistent between these two situations, 
this provides confidence in the precise formulation of the data assimilation tasks. Similarly, 
when that consistency is absent, confidence is lost. We do not provide a remedy in the case 
of inconsistency. 

To execute this consistency test one requires basically the same numerical evaluations as 
in performing the overall data assimilation effort using the path integral formulation of the 
problem so carrying out the consistency check is computationally quite inexpensive. 

We do not give a remedy to finding inconsistency in the assumed and determined stochas- 
tic model error. However, in the dynamical parameter and state estimation procedure [l6| 




we found that adding a control term u(t){y{t) — x(t)) to the dynamics and a quadratic term 
in the 'control' u{t) to the equivalent of the action, we could extract indications of where 
in time model errors might occur and require compensation by the external forcing u{t). 
We have not pursued the possibility here, but it seems attractive to consider adding such 
controls to both the model error terms and the mutual information terms of the action to 
trace the locations of errors in the model in phase space, though the temporal location of a 
requirement for large u{t). If discretized space is part of the physical setting, that too could 
be traced in this manner. 
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Figure 1: Left Distribution of the stochastic model error M£'76(81) from the D = 100 Lorenz96 model 
compared to a best fit Gaussian distribution. Right Distribution of the stochastic model error M£'47(60) 
from the D — 100 Lorenz96 model compared to a best fit Gaussian distribution. 
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Figure 2: Left Using observed data (blue circles) in the observation window [0,4], the expected value of the 
(observed) variable X47{t) (black line) was estimated using the path integral discussed in the text. The RMS 
error (red error bars) around this expected value was also estimated. Using the estimated value of all 100 
state variables (40 observed and 60 unobserved) along with the estimated parameter, predictions including 
RMS errors were made into [4, 6] as shown. Since the Lorenz96 D = 100 model is chaotic for the chosen 
forcing, the error bars grow in the prediction interval. Right Using observed data in the observation window 
[0, 4], the expected value of the (unobserved) variable a;7g(t) (black line) was estimated using the path integral 
discussed in the text. The RMS error (red error bars) around this expected value was also estimated. Using 
the estimated value of all 100 state variables (40 observed and 60 unobserved) along with the estimated 
parameter, predictions including RMS errors were made into [4, 6] as shown. Since the Lorenz96 D = 100 
model is chaotic for the chosen forcing, the error bars grow in the prediction interval. 
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Figure 3: Left Skewness of the state variable 3:47 (i), one of the observed quantities presented to the model 
though the data assimilation path integral. During the observation period the skewness remains small 
suggesting the state distribution may be approximately Gaussian in this interval. The skewness grows 
rapidly when observations no longer are available to guide the trajectory of the model and it moves rapidly 
to its attractor which is comprised of points in 100 dimensional state space which are not distributed as 
a Gaussian. Right Kurtosis of the state variable 0:47 (i), one of the observed quantities presented to the 
model though the data assimilation path integral. During the observation period the kurtosis remains small 
suggesting the state distribution may be approximately Gaussian in this interval. The kurtosis grows rapidly 
when observations no longer are available to guide the trajectory of the model and it moves rapidly to its 
attractor which is comprised of points in 100 dimensional state space which are not distributed as a Gaussian. 
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Figure 4: Left Skewness of the state variable X7e{t), one of the unobserved quantities not presented to 
the model though the data assimilation path integral. During the observation period the skewness remains 
small suggesting the state distribution may be approximately Gaussian in this interval. The skewness grows 
rapidly when observations no longer are available to guide the trajectory of the model and it moves rapidly 
to its attractor which is comprised of points in 100 dimensional state space which are not distributed as a 
Gaussian. Right Kurtosis of the state variable xrQ{t), one of the unobserved quantities not presented to 
the model though the data assimilation path integral. During the observation period the kurtosis remains 
small suggesting the state distribution may be approximately Gaussian in this interval. The kurtosis grows 
rapidly when observations no longer are available to guide the trajectory of the model and it moves rapidly 
to its attractor which is comprised of points in 100 dimensional state space which are not distributed as a 
Gaussian. 



Lorenz96 D = 20 



LorenzSe D = 20 




-0.1 0.0 0.1 

ME,,(104) 



0.2 0.3 




Figure 5: Left Distribution of the stochastic model error Mi?ii(104) from the D = 20 Lorenz96 model 
compared to a best fit Gaussian distribution. Right Distribution of the stochastic model error MEs{91) 
from the D = 20 Lorenz96 model compared to a best fit Gaussian distribution. 
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Lorenz96 D = 20 
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Figure 6: Using observed data (blue triangles) in the observation window [0, 8.2], the expected value of the 
(observed) variable xij{t) (black line) was estimated using the path integral discussed in the text. The RMS 
error (red error bars) around this expected value was also estimated. 
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Figure 7: Left Skewness of the state variable xn{t), one of the observed quantities presented to the model 
though the data assimilation path integral. Right Kurtosis of the state variable xir{t), one of the observed 
quantities presented to the model though the data assimilation path integral. 
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Figure 8: Left Skewness of the state variable xi4{t), one of the unobserved quantities not presented to the 
model though the data assimilation path integral. Right Kurtosis of the state variable Xi4{t), one of the 
unobserved quantities not presented to the model though the data assimilation path integral. 
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